Associative Arithmetic with Boltzmann Machines: The Role of Number Representations
Abstract
This paper presents a study of associative mental arithmetic with mean-field Boltzmann Machines. We examined the role of number representations, showing theoretically and experimentally that cardinal number representations (e.g., numerosity) are superior to symbolic and ordinal representations with respect to learnability and cognitive plausibility. Only the network trained on numerosities exhibited the problem-size effect, the core phenomenon in human behavioral studies. These results urge a reevaluation of current cognitive models of mental arithmetic.

1 Simple Mental Arithmetic

Research on mental number processing has revealed a specific substrate for number representations in the inferior parietal cortex of the human brain, where numbers are thought to be encoded in a number-line (NL) format (Fig. 2a,b) [1]. Simple mental arithmetic, however, is thought to be based on an associative network storing arithmetic facts in a verbal (symbolic) format [2]. Psychometric studies show that in the production or verification of single-digit arithmetic problems (e.g., addition or multiplication), reaction times (RTs) and errors increase as a function of the size of the problem (the problem-size effect) [3]. Problem size can be indexed by some function of the two operands, such as their sum or the square of their sum; the latter is the best predictor of the reaction-time data for simple addition problems [4].

The first connectionist attempt to model simple multiplication was based on the autoassociative Brain-State-in-a-Box (BSB) network [5]. Numbers were represented jointly as NL and symbolic codes. Learning performance was far from optimal, even though the problem had been simplified to computing "approximate" multiplication (to reduce the computational load). A later model of simple multiplication was MATHNET [6]. This model used NL representations and was implemented with a Boltzmann Machine (BM) [7]. The network was exposed to the arithmetic problems according to a schedule that roughly followed the experience of children learning arithmetic, i.e., facts with small operands came before larger facts. However, fact frequency was manipulated in a way that did not reflect the real distribution: small facts were presented up to seven times as often as large problems. As a result, MATHNET exhibited a (weak) problem-size effect that was entirely produced by the specific training schedule and by the implausible frequency manipulation.

In the present study, we used mean-field BMs trained with the contrastive divergence learning algorithm [8] to model the simple addition task and to contrast three different hypotheses about the representation of numbers. We found that numerosity-based representations facilitate learning and provide the best match to human reaction times. We conclude that the traditional view of symbolic mental arithmetic should be reevaluated and that number representations for arithmetic should incorporate the basic property of cardinal meaning.

Figure 1. BMs for Mental Arithmetic. Patterns are encoded at the visible layer. To recall a fact, its two arguments are fixed at the visible layer and the network iterates until convergence.
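As a concrete illustration of the three coding hypotheses contrasted in this study (symbolic, ordinal number-line, and cardinal numerosity codes), the sketch below shows one plausible way to lay out number fields on the visible layer. The field width, the spread of the number-line bump, and the helper names are illustrative assumptions, not the encodings actually used in the paper.

    import numpy as np

    N_MAX = 10  # assumed largest number; the actual operand range is not given in this excerpt

    def symbolic_code(n):
        # Symbolic code: a one-hot "label" carrying no magnitude information.
        v = np.zeros(N_MAX + 1)
        v[n] = 1.0
        return v

    def number_line_code(n, sigma=0.8):
        # Ordinal number-line code: a graded bump of activation centred on position n.
        pos = np.arange(N_MAX + 1)
        return np.exp(-((pos - n) ** 2) / (2 * sigma ** 2))

    def numerosity_code(n):
        # Cardinal (numerosity) code: the first n units are active.
        v = np.zeros(N_MAX + 1)
        v[:n] = 1.0
        return v

    # A visible pattern for an addition fact (e.g. 3 + 4 = 7) concatenates the
    # codes of the two operands and of the result:
    pattern = np.concatenate([numerosity_code(3), numerosity_code(4), numerosity_code(7)])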
2 Mean-Field BMs with Contrastive Divergence Learning

In line with previous connectionist attempts to model mental arithmetic, we assume that arithmetic facts are learned and stored in an associative NN. One typical associative network is the Boltzmann Machine, consisting of binary neurons with stochastic dynamics, fully connected with symmetric weights that store correlations of activations between connected units [7]. Data are encoded by visible neurons, whereas hidden neurons capture high-order statistics (Fig. 1). BMs recall stored patterns by synchronous or asynchronous iterations, starting from initial activations representing partial patterns.

Learning derives from the probabilistic law governing the network's free-running state and seeks a weight set W for which the free-running distribution P_BM(s) matches the data distribution Q(s), i.e., P_BM(s) = Q(s). The learning procedure minimizes the Kullback-Leibler divergence between the distributions at time zero and at equilibrium by computing derivatives w.r.t. the weights and applying gradient descent:

Δw_ij = η ( ⟨s_i s_j⟩^0 − ⟨s_i s_j⟩^∞ )    (1)

The update of the weight connecting two units is proportional to the difference between the average correlations of these two units, computed at time 0 (the positive, or fixed, phase) and after the network has reconstructed the pattern (the negative, or free-running, phase). Since the stochastic BM is computationally intractable, [9] replaced the correlations with a mean-field approximation, ⟨s_i s_j⟩ ≈ m_i m_j, where m_i is the mean-field activity of neuron i, given by the solution of a set of n coupled mean-field equations (2). This approximation turns the stochastic BM into a deterministic discrete-time NN, since we can operate entirely with mean-field values, which also allows graded activations.

m_i = σ ( Σ_j w_ij m_j + θ_i )    (2)

Hinton [10] replaced the correlations computed in the free-running phase with correlations computed after a one-step reconstruction of the data (contrastive divergence learning), which was also shown to drive the weights toward a state in which the data are reproduced according to their distribution. This was followed by the fast contrastive divergence mean-field learning rule (3) [8], which we use in our simulations:

Δw_ij = η ( ⟨m_i m_j⟩^0 − ⟨m_i m_j⟩^1 )    (3)

To learn a set of patterns, the learning algorithm presents them to the network in batches. For each pattern, the network performs a positive (wake) phase, in which only the hidden layer settles, and a negative (sleep) phase, in which the network reconstructs the visible pattern and then settles the hidden layer once again. After each phase, statistics for the correlations between the activations of each pair of connected neurons are collected. The weights can be updated either after each pattern or at the end of a batch; here we used batch learning.

The network recalls patterns by initializing the visible layer with part of a pattern and iterating, updating the hidden and the visible layers in turn, until convergence. The number of steps to convergence corresponds to the RT. We used the unsupervised learning mode, which in the sleep phase …
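To make the wake/sleep procedure and the recall dynamics concrete, here is a minimal sketch of a mean-field BM trained with one-step contrastive divergence, following equations (2) and (3). For simplicity it uses only visible-hidden weights (no lateral connections within a layer, unlike the fully connected network described above); the layer sizes, learning rate, convergence tolerance, and all names are illustrative assumptions rather than the paper's actual implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class MeanFieldBM:
        """Mean-field Boltzmann Machine with one-step contrastive divergence.
        Only visible-hidden weights are modelled here (no lateral connections),
        a simplification relative to the fully connected BM described in the text."""

        def __init__(self, n_visible, n_hidden, eta=0.05):
            self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))  # symmetric weights
            self.b_v = np.zeros(n_visible)   # visible biases (theta)
            self.b_h = np.zeros(n_hidden)    # hidden biases (theta)
            self.eta = eta

        def hidden_given_visible(self, v):
            # Mean-field equation (2) for the hidden layer, visible layer fixed.
            return sigmoid(v @ self.W + self.b_h)

        def visible_given_hidden(self, h):
            # Mean-field equation (2) for the visible layer, hidden layer fixed.
            return sigmoid(h @ self.W.T + self.b_v)

        def train_batch(self, patterns):
            # Batch learning: accumulate wake/sleep statistics, then update once.
            dW = np.zeros_like(self.W)
            for v0 in patterns:
                h0 = self.hidden_given_visible(v0)       # positive (wake) phase
                v1 = self.visible_given_hidden(h0)       # one-step reconstruction
                h1 = self.hidden_given_visible(v1)       # negative (sleep) phase
                dW += np.outer(v0, h0) - np.outer(v1, h1)   # equation (3)
            self.W += self.eta * dW / len(patterns)

        def recall(self, v_clamped, clamp_mask, max_steps=200, tol=1e-4):
            # Clamp the operand units and iterate hidden/visible updates until
            # convergence; the iteration count is read as the model's RT.
            v = v_clamped.copy()
            for step in range(1, max_steps + 1):
                h = self.hidden_given_visible(v)
                v_new = self.visible_given_hidden(h)
                v_new[clamp_mask] = v_clamped[clamp_mask]   # keep the operands fixed
                if np.max(np.abs(v_new - v)) < tol:
                    return v_new, step
                v = v_new
            return v, max_steps

In recall, the operand units stay clamped while the remaining visible units and the hidden layer are updated in alternation; the number of iterations to convergence plays the role of the model's reaction time, as in the simulations described above.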
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید